This paper analyzes correlations in patterns of trading of different members of the London Stock Exchange. The collection of strategies associated with a member institution is defined by the sequence of signs of net volume traded by that institution in hour intervals. Using several methods we show that there are significant and persistent correlations between institutions. In addition, the correlations are structured into correlated and anti-correlated groups. Clustering techniques using the correlations as a distance metric reveal a meaningful clustering structure with two groups of institutions trading in opposite directions.
I. INTRODUCTION

The aim of this paper is to examine the heterogeneity of the trading strategies associated with different members of the London Stock Exchange (LSE). This is made possible by a dataset that includes codes identifying which member of the exchange placed each order. While we don't know who the member actually is, we can link together the trading orders placed by the same member. A member firm can be a large investment bank, in which case the order flow associated with the code will be an aggregation of various strategies used by the bank and its clients, or, at the other extreme, a single hedge fund. Thus, while we cannot identify patterns of trading at the level of individual trading strategies, we can test whether there is heterogeneity in the collections of strategies associated with different members of the exchange. For convenience we will refer to the collection of strategies associated with a given member of the exchange simply as a strategy, and to the member of the exchange simply as an institution, but it should be borne in mind throughout that "a strategy" is typically a collection of strategies, which may reflect the actions of several different institutions, and thus may internally be quite heterogeneous.

We define a strategy by its actions, i.e. by the net trading of an institution as a function of time. If the net volume traded by an institution in a period of time is positive there is a net imbalance of buying; conversely, if the net volume is negative there is a net imbalance of selling. We test whether two strategies are similar by correlating the periods in which they are net buyers and net sellers.

Two studies similar in spirit to this one are [5] and [11], in which the authors analyse trading strategies using data from the Spanish Stock Exchange. A number of related studies analysing market correlations can be found in [?].

A. The LSE dataset 

The LSE is a hybrid market with two trading mechanisms operating in parallel. One is called the on-book or "downstairs" market and operates as an anonymous electronic order book employing the standard continuous double auction. The other is called the off-book or "upstairs" market and is a bilateral exchange where trades are arranged via telephone. We analyse the two markets separately.

The market is open from 8:00 to 16:30, 
but for this analysis we discarded data from 
the first hour (8:00 - 9:00) and last half hour 
of trading (16:00 - 16:30) in order to avoid 
possible opening and closing effects. 

We base the analysis on four stocks: Vodafone Group (VOD, telecommunications), AstraZeneca (AZN, pharmaceuticals), Lloyds TSB (LLOY, insurance) and Anglo American (AAL, mining). We chose VOD as it is one of the most liquid stocks on the LSE. LLOY and AZN are examples of frequently traded liquid stocks, and AAL is a low-volume, illiquid stock.



B. Measuring correlations between strategies

The institution codes we use in this analysis are re-scrambled by the exchange each month for privacy reasons^. This naturally divides the dataset into monthly intervals, which we treat as independent samples. The data span from September 1998 to May 2001, so we have 32 samples for each stock.

In order to define the trading strategies we further divide the monthly samples into hourly intervals. We believe that one hour is a reasonable choice, capturing short-time-scale intraday variations but also providing some averaging to reduce noise. For each of these hour intervals and for each institution individually, we calculate the net traded volume in monetary units (British pounds). Net volume is total buy volume minus total sell volume. We then assign to each institution and hour interval a value of +1, -1 or 0 describing its strategy in that interval. If the net volume in an interval is positive (the institution in that period is a net buyer) we assign it the value +1. If the institution's net volume in the interval is negative (the institution is a net seller) we assign it the value -1. If the institution is not active within the interval we assign it the value 0. We discard institutions that are not active for more than 1/3 of the time.
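The assignment of +1, -1 and 0 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the trades have already been restricted to 9:00-16:00 and tagged with an integer institution index and an hour-interval index, and the function name `strategy_matrix` is hypothetical. It also assumes one reading of the activity filter, namely that institutions active in at most one third of the hour intervals are discarded; the threshold is a parameter.

```python
import numpy as np

def strategy_matrix(inst, hour, signed_gbp, n_hours, min_active_frac=1/3):
    """Ternary N x T strategy matrix for one stock and one month.

    inst       : integer institution index of each trade (0 .. n_inst-1)
    hour       : hour-interval index of each trade (0 .. n_hours-1)
    signed_gbp : trade value in pounds, positive for a buy, negative for a sell
    Returns the matrix (+1 net buyer, -1 net seller, 0 inactive or flat)
    and the indices of the institutions that were kept.
    """
    inst = np.asarray(inst, dtype=int)
    hour = np.asarray(hour, dtype=int)
    n_inst = inst.max() + 1
    net = np.zeros((n_inst, n_hours))
    # Accumulate net volume per (institution, hour) cell; repeated cells add up.
    np.add.at(net, (inst, hour), np.asarray(signed_gbp, dtype=float))
    active = np.zeros((n_inst, n_hours), dtype=bool)
    active[inst, hour] = True
    M = np.sign(net).astype(int)
    # Assumed reading of the activity filter: keep institutions that are
    # active in more than min_active_frac of the hour intervals.
    keep = active.mean(axis=1) > min_active_frac
    return M[keep], np.flatnonzero(keep)

# Toy month with 4 hour intervals and three institutions (made-up trades).
M, kept = strategy_matrix(inst=[0, 0, 1, 1, 2], hour=[0, 1, 0, 1, 0],
                          signed_gbp=[100.0, -50.0, -20.0, 30.0, 10.0],
                          n_hours=4)
```

Institution 2 is active in only one of the four intervals, so it is dropped, and the two remaining rows are mirror images of each other.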

Three examples of trading strategies are shown in figure 1. The examples show cumulated trading strategies for three institutions trading Vodafone on-book in the month of November 2000.

^ However, we have found a way to track institutions' on-book trading over longer time periods, which we exploit later in the paper (Section II D).

In the end we obtain, for each month of trading, a set of time series $x_i(t)$ representing the net trading direction of each institution $i$, which can be organized in a 'strategy matrix' $M$ with dimensions $N \times T$, where $N$ is the number of active institutions and $T$ is the number of hour intervals in that month. The number of active institutions varies from month to month and between stocks. A typical value of $N$ for the on-book market is around 70 for liquid stocks and around 40 for less liquid stocks. For the off-book market, the numbers are 1.5 to 2 times larger. The number of hour intervals depends on the number of working days in a month and is around $T = 7 \times 20 = 140$ (seven hours per day over roughly 20 trading days).

Given the monthly strategy matrices $M$, we then construct the $N \times N$ monthly correlation^ matrices between the institutions' strategies. An example of a correlation matrix, for off-book trading in Vodafone in November 2000, is given in the top panel of figure 2. Dark colors represent high absolute correlations, with red positive and blue negative. Since the ordering of institutions is arbitrary, we use the ordering suggested by a clustering algorithm, as explained in Section II C. It is visually suggestive that the correlations are not random: some groups of institutions are strongly anti-correlated with the rest while being positively correlated among themselves.
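The kind of block structure visible in figure 2 can be reproduced on synthetic data. The sketch below is illustrative only: the helper `synthetic_strategies`, the group sizes, the 20% flip rate and the 10% inactivity rate are all arbitrary assumptions, not LSE data. It plants a majority group that follows a common hourly buy/sell signal, a minority that trades against it, and a group trading at random, and then computes the $N \times N$ correlation matrix with `numpy.corrcoef`, which treats each row as one variable.

```python
import numpy as np

def synthetic_strategies(n_major=20, n_minor=6, n_noise=8, T=140, flip=0.2,
                         idle=0.1, seed=0):
    """Illustrative ternary strategy matrix with planted correlated groups."""
    rng = np.random.default_rng(seed)
    signal = rng.choice([-1, 1], size=T)          # common direction each hour

    def follow(direction, n):
        rows = np.tile(direction * signal, (n, 1))
        flips = rng.random((n, T)) < flip          # idiosyncratic deviations
        rows[flips] = -rows[flips]
        rows[rng.random((n, T)) < idle] = 0        # inactive hours
        return rows

    noise = rng.choice([-1, 0, 1], size=(n_noise, T))
    return np.vstack([follow(1, n_major), follow(-1, n_minor), noise])

M = synthetic_strategies()
C = np.corrcoef(M)      # rows are institutions, so C is N x N
```

With these settings the majority block has positive correlations of roughly 0.3, the majority-minority block roughly -0.3, and the random group is close to zero correlation with everyone.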

A formal test of significance based on the t-test cannot be used, as it assumes normally distributed disturbances, whereas we have discrete ternary values. Later in the text we use a bootstrap approach to test the significance. For now, however, we test the significance of the correlation coefficients using a standard algorithm as in ref. [1]. The algorithm calculates the approximate tail probabilities for Spearman's correlation coefficient $\rho$. Its precision unfortunately degrades when there are ties in the data, which is the case here. With this caveat in mind, as a preliminary test we find that, for example, for on-book trading in Vodafone in May 2000, 10.3% of all correlation coefficients are significant at the 5% level. Averaging over all stocks and months, the average percentage of significant coefficients is 10.5% ± 0.4% for on-book trading and 20.7% ± 1.7% for off-book trading. Both averages are substantially higher than the 5% we would expect by chance at the 5% significance level of the test.

^ Since the data take only three distinct values (0, 1 and -1), Pearson and Spearman correlations are equivalent.
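A rough analogue of this preliminary count is sketched below. It swaps in a different technique from the paper's: instead of the tail-probability algorithm of ref. [1], it uses the large-sample approximation that, for independent series, $r\sqrt{T-1}$ is approximately standard normal. The function name is hypothetical. On independent random ternary strategies the fraction of significant pairs should come out close to the nominal 5%.

```python
import numpy as np
from math import erfc, sqrt

def fraction_significant(M, level=0.05):
    """Share of institution pairs whose correlation is significant at `level`.

    Uses the large-sample approximation r * sqrt(T - 1) ~ N(0, 1) under
    independence (an assumption; the paper uses the algorithm of ref. [1]).
    """
    M = np.asarray(M, dtype=float)
    N, T = M.shape
    C = np.corrcoef(M)
    r = C[np.triu_indices(N, k=1)]            # each pair counted once
    # Two-sided normal p-value: P(|Z| > z) = erfc(z / sqrt(2)).
    p = np.array([erfc(abs(x) * sqrt(T - 1) / sqrt(2)) for x in r])
    return float(np.mean(p < level))

rng = np.random.default_rng(0)
null_frac = fraction_significant(rng.choice([-1, 0, 1], size=(60, 140)))
```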

II. SIGNIFICANCE AND STRUCTURE IN THE CORRELATION MATRICES

The preliminary result of the previous section that some correlation coefficients are non-random is further corroborated by testing for non-random structure in the correlation matrices. The hypothesis that there is structure in the correlation matrices contains the weaker hypothesis that some coefficients are statistically significant.

A test for structure in the matrices would involve multiple joint tests for the significance of the coefficients. An alternative method, however, is to examine the eigenvalue spectrum of the correlation matrices. Intuitively, one can understand the relation between the two tests by remembering that the eigenvalues $\lambda$ of a matrix $A$ are roots of the characteristic equation $\det(A - \lambda I) = 0$, and that the determinant is a signed sum over permutations of products of the matrix elements, $\det(A) = \sum_{\pi} \epsilon_{\pi} \prod_{i} a_{i,\pi(i)}$, where $\pi$ runs over the permutations and $\epsilon$ is the Levi-Civita antisymmetric tensor. Moreover, the test is directly related to principal component analysis: the eigenvectors of the correlation matrix are the principal components, and the corresponding eigenvalues give the variance each component explains.
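The connection to principal component analysis can be illustrated on a toy strategy matrix (synthetic data; group sizes and flip probability are arbitrary assumptions, and `flipped` is a hypothetical helper). When one group trades with a common signal and a smaller group trades against it, a single large eigenvalue appears, and its eigenvector, the first principal component, has loadings of opposite sign on the two groups.

```python
import numpy as np

def flipped(signal, n, flip, rng):
    """n noisy copies of a +/-1 signal: each hour is flipped with prob. flip."""
    return np.where(rng.random((n, signal.size)) < flip, -signal, signal)

rng = np.random.default_rng(0)
T, n_major, n_minor = 140, 20, 6
signal = rng.choice([-1, 1], size=T)
M = np.vstack([flipped(signal, n_major, 0.2, rng),    # trade with the signal
               flipped(-signal, n_minor, 0.2, rng)])  # trade against it

C = np.corrcoef(M)
eigval, eigvec = np.linalg.eigh(C)        # ascending eigenvalues, C symmetric
lead = eigvec[:, -1]                      # first principal component
lead = lead * np.sign(lead[:n_major].sum())   # eigenvector sign is arbitrary
```

The eigenvalues sum to $N = 26$ (the trace of a correlation matrix), and with pairwise correlations of about $\pm 0.36$ the largest one is close to $1 + 25 \times 0.36 \approx 10$.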

The existence of empirical eigenvalues larger than the values expected under the null hypothesis implies that there is structure in the correlation matrices and that the coefficients are significant.

FIG. 1: Three examples of institutions' strategies for on-book trading in Vodafone. We plot the cumulative sum of the (+1, -1 and 0) indicators of hourly net trading volume within a month. The three institutions were not chosen randomly but rather to illustrate three very different trading styles. Institution 2598 appears to be building up a position, institution 3733 could be acting as a market maker, while 3463 seems to be a mix of the two. In reality, only a small number of institutions show strong autocorrelations in their strategies (such as the top and bottom institutions in the plot); most institutions do not have such suggestive cumulative plots.



A. Density of the correlation matrix eigenvalue distribution

For a set of uncorrelated time series of infinite length, all eigenvalues of the corresponding correlation matrix (which in this case would be diagonal) are equal to 1. For finite-length time series, however, even if the underlying generating processes are completely uncorrelated, the eigenvalues will not be exactly equal to one; there will be some scatter around one. This scattering is described by a result from random matrix theory [3, ?, 10].

FIG. 2: Example of a rearranged correlation matrix for off-book market trading in Vodafone in November 2000. The ordering of institutions is based on the result of the clustering algorithm explained in Section II C. Red colors represent positive correlations between institutions' strategies, blue colors represent negative correlations, and darker colors indicate larger correlations. The dendrogram resulting from the clustering is shown below the correlation matrix. For visual clarity the dendrogram is cut at height 1.4.
For random uncorrelated variables, each of length $T$, in the limit $T \to \infty$ and $N \to \infty$ while keeping the ratio $Q = T/N \geq 1$ fixed, the density of eigenvalues $\rho(\lambda)$ of the correlation matrix is given by the functional form

$$\rho(\lambda) = \frac{Q}{2\pi\sigma^2}\,\frac{\sqrt{(\lambda_{\max}-\lambda)(\lambda-\lambda_{\min})}}{\lambda}, \qquad \lambda_{\max,\min} = \sigma^2\left(1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}\right), \qquad (1)$$

where $\sigma^2$ is the variance of the time series and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$. Apart from being a limiting result, this expression is derived for Gaussian series. As it turns out, the Gaussian assumption is not critical, at least not for the right limit $\lambda_{\max}$, which is the one of interest for this study. We show in a subsequent section a simulation result confirming this observation.
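Eq. (1) is straightforward to evaluate numerically. The sketch below (function names hypothetical) computes the edges $\lambda_{\min}$, $\lambda_{\max}$ and the density, and then compares the upper edge with the largest eigenvalue of a correlation matrix built from random ternary strategies with typical on-book dimensions ($N = 70$, $T = 140$). Taking $\sigma^2 = 1$ is an assumption appropriate for a correlation matrix of standardized series; the paper instead computes $\sigma$ mechanically from the data.

```python
import numpy as np

def mp_bounds(Q, sigma2=1.0):
    """lambda_min and lambda_max of Eq. (1)."""
    r = np.sqrt(1.0 / Q)
    return sigma2 * (1 + 1 / Q - 2 * r), sigma2 * (1 + 1 / Q + 2 * r)

def mp_density(lam, Q, sigma2=1.0):
    """Eigenvalue density of Eq. (1); zero outside [lambda_min, lambda_max]."""
    lo, hi = mp_bounds(Q, sigma2)
    lam = np.asarray(lam, dtype=float)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    rho[inside] = Q / (2 * np.pi * sigma2) * np.sqrt((hi - x) * (x - lo)) / x
    return rho

# Typical on-book dimensions from the text: N ~ 70 institutions, T ~ 140 hours.
lo, hi = mp_bounds(Q=140 / 70)

# Finite-size check with ternary (non-Gaussian) random strategies.
rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=(70, 140))
largest = np.linalg.eigvalsh(np.corrcoef(X)).max()
```

For $Q = 2$ and $\sigma^2 = 1$ the upper edge is $1.5 + \sqrt{2} \approx 2.91$; the largest eigenvalue of the random ternary example should land near this value, in line with the remark that the Gaussian assumption is not critical for $\lambda_{\max}$.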

A further consideration is the fact that the parameters $Q = T/N$ and $\sigma$ change every month^, as both the number of hour intervals and the number of institutions vary. Consequently the predicted eigenvalue density under the null changes from month to month. In principle we should construct a separate test for each month, comparing the eigenvalues of a particular month with the null distribution using the appropriate values of $Q = T/N$ and $\sigma$. However, the monthly values of $Q$ and $\sigma$ do not vary much, and the variation does not change the functional form of the null hypothesis substantially. In view of the fact that Eq. (1) is valid only in the limit in any case, we pool the eigenvalues for all months together, construct a density estimate and compare it with the null using the monthly averages of $Q$ and $\sigma$.

^ $\sigma$ is calculated mechanically using the standard formula, as if the time series had a continuous density function rather than discrete ternary values.

Figure 3 shows the empirical eigenvalue density compared with the expected density under the null for the stock Vodafone. The top figure shows on-book trading, while the bottom figure shows off-book trading. In both markets there are a number of eigenvalues larger than the cutoff $\lambda_{\max}$ and not consistent with uncorrelated time series. The eigenvalues are larger in the off-book market because the correlation matrices are larger: there are more traders active in the off-book market than in the on-book market, and for a common component shared by $N$ institutions with average correlation $\bar{\rho}$ the largest eigenvalue is roughly $1 + (N-1)\bar{\rho}$. The largest eigenvalue in the off-book market is 5 times the noise cutoff, whereas it is only 2 times the cutoff in the on-book market. Similar results are found for the other stocks as well.
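The pooled comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: `pooled_eigenvalues` and `planted_month` are hypothetical helpers, $\lambda_{\max}$ is evaluated at the average $Q = T/N$ with $\sigma^2 = 1$ by default (the paper uses the monthly averages of $Q$ and of the mechanically computed $\sigma$, as described in the footnote), and the "structured" months are synthetic.

```python
import numpy as np

def pooled_eigenvalues(monthly_matrices, sigma2=1.0):
    """Pool eigenvalues over months; return them with lambda_max of Eq. (1)
    evaluated at the average Q = T/N (sigma2 = 1 is an assumption)."""
    eigs, qs = [], []
    for M in monthly_matrices:
        N, T = np.shape(M)
        eigs.append(np.linalg.eigvalsh(np.corrcoef(M)))
        qs.append(T / N)
    q = np.mean(qs)
    lam_max = sigma2 * (1 + 1 / q + 2 * np.sqrt(1 / q))
    return np.concatenate(eigs), lam_max

def planted_month(rng, N=60, T=140, n_major=45, flip=0.2):
    """Illustrative month: n_major institutions follow a common signal,
    the rest trade against it, each hour flipped with probability flip."""
    signal = rng.choice([-1, 1], size=T)
    side = np.where(np.arange(N) < n_major, 1, -1)[:, None]
    M = side * signal
    return np.where(rng.random((N, T)) < flip, -M, M)

rng = np.random.default_rng(0)
null_eigs, null_cut = pooled_eigenvalues(
    [rng.choice([-1, 0, 1], size=(60, 140)) for _ in range(6)])
real_eigs, real_cut = pooled_eigenvalues([planted_month(rng) for _ in range(6)])
n_above = np.sum(real_eigs > real_cut)    # eigenvalues beyond the noise cutoff
```

For uncorrelated strategies the pooled spectrum stays close to the cutoff, whereas each planted month contributes an eigenvalue many times larger than it.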

B. Bootstrapping the largest eigenvalues

The weaknesses of the parametric eigenvalue test (it is an asymptotic result derived for Gaussian series, and it pools months with different $Q$ and $\sigma$) can be remedied by focusing only on the largest eigenvalues and making a bootstrap test. For each month we construct a realization of the null hypothesis by randomly shuffling the buy and sell periods of each institution. In this way we obtain a bootstrapped strategy matrix $\tilde{M}$ in which the institutions' strategies are uncorrelated, while the number of buying, selling and inactive periods of each institution is preserved. The shuffling therefore preserves each institution's monthly totals, meaning that long-term (monthly or longer) correlations between institutions are not altered.

The shuffling also removes serial correlation in each institution's strategy. For most institutions this is not a problem because they do not display autocorrelations in the first place. However, for the group of institutions that do show autocorrelated strategies, this can be an issue.
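The shuffled null can be sketched as an independent permutation of each row of the strategy matrix (function names hypothetical, data synthetic). The example checks the two properties discussed above: each institution keeps its numbers of buying, selling and inactive hours, while both its correlation with other institutions and its own serial correlation are destroyed, as the autocorrelated "position-building" row shows.

```python
import numpy as np

def shuffle_strategies(M, rng):
    """Null realization: permute each institution's hourly +1/-1/0 values
    independently, keeping its numbers of buying, selling and idle hours."""
    return np.array([rng.permutation(row) for row in np.asarray(M)])

def lag1_autocorr(x):
    """Correlation between consecutive hours of one strategy."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(0)
# An autocorrelated strategy: long runs of buying, then selling (140 hours).
trend = np.repeat([1, -1, 1, -1], 35)
M = np.vstack([trend, -trend, rng.choice([-1, 0, 1], size=140)])
S = shuffle_strategies(M, rng)
```

Before shuffling, the first two rows are perfectly anti-correlated and the first row has a lag-1 autocorrelation above 0.9; after shuffling both quantities are close to zero.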

From the bootstrapped strategy matrix we calculate the correlation matrix and the eigenvalues. This is repeated for each month separately 1000 times.

FIG. 3: Empirical density of eigenvalues of the correlation matrices (red) compared to the theoretical density for a random matrix (blue). Many eigenvalues are not consistent with the hypothesis of a random matrix.

As already noted, instead of looking at the significance of all empirical eigenvalues, we focus only on the two largest eigenvalues for each month. Correspondingly, we compare them with the null distribution of the two largest eigenvalues for that month: from each of the 1000 simulated correlation matrices, we keep only the two largest eigenvalues. We are therefore comparing the two largest empirical eigenvalues with an ensemble of largest eigenvalues from the 1000 simulated correlation matrices that correspond to the null appropriate to that month. In this way the variations of $Q$ and $\sigma$, as well as small-sample properties, are taken into account in the test.
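Putting the pieces together, a minimal version of the monthly bootstrap test might look like this. Names are hypothetical, the strategy matrix is synthetic, and the decision rule (compare with the median plus 2 standard deviations of the null distribution, i.e. the upper end of the error bars described in the next paragraph) is our assumption about how the comparison is made.

```python
import numpy as np

def top_two_eigenvalues(M):
    """Largest and second largest eigenvalue of the correlation matrix."""
    lam = np.linalg.eigvalsh(np.corrcoef(M))      # returned in ascending order
    return lam[-1], lam[-2]

def bootstrap_null(M, n_boot=1000, seed=0):
    """n_boot x 2 array of the two largest eigenvalues under the shuffled null."""
    rng = np.random.default_rng(seed)
    M = np.asarray(M)
    null = np.empty((n_boot, 2))
    for b in range(n_boot):
        shuffled = np.array([rng.permutation(row) for row in M])
        null[b] = top_two_eigenvalues(shuffled)
    return null

def exceeds_null(value, null_values):
    """Assumed rule: significant if above median + 2 standard deviations."""
    return value > np.median(null_values) + 2 * np.std(null_values)

# Synthetic month: 25 institutions trade with a common signal, 8 against it,
# 10 at random (all numbers are illustrative).
rng = np.random.default_rng(1)
signal = rng.choice([-1, 1], size=140)
M = np.vstack([np.where(rng.random((25, 140)) < 0.25, -signal, signal),
               np.where(rng.random((8, 140)) < 0.25, signal, -signal),
               rng.choice([-1, 0, 1], size=(10, 140))])
emp1, emp2 = top_two_eigenvalues(M)
null = bootstrap_null(M, n_boot=200)   # the paper uses 1000 repetitions
significant = exceeds_null(emp1, null[:, 0])
```

Column 0 of `null` holds the largest and column 1 the second largest eigenvalue of each shuffled matrix, so the empirical largest and second largest eigenvalues are each compared with their own null ensemble.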

Figure 4 shows the results for all 32 months of trading in Vodafone. Again, the top figure shows the monthly eigenvalues for the on-book market and the bottom figure those for the off-book market. The largest empirical monthly eigenvalues are shown as blue points. They are to be compared with the blue vertical error bars, which represent the width of the distribution of the maximum eigenvalue under the null. The error bars are centered at the median and represent two standard deviations of the underlying distribution. Since the distribution is relatively close to normal, the width represents about 95% of the density mass. The analogous red symbols show the second largest eigenvalue for each month. We first note that the median of the distribution of the maximum eigenvalue under the random null fluctuates roughly between 2.4 and 2.5. These values are not so different from $\lambda_{\max}^{\text{theo}} = 2.5$, which we used in the parametric test. It even seems that in small samples and with ternary data, the largest eigenvalue under the null tends to decline as the number of data points decreases, which makes the parametric test conservative. The same conclusion can be drawn by looking at the off-book market.

The largest eigenvalue is significant in all months in both markets. Corroborating the parametric test, the largest eigenvalues in the off-book market are relatively further away from the corresponding null than in the on-book market, confirming the observation that the correlations are stronger for off-book trading. However, while stronger, they are perhaps of a simpler nature: the second largest eigenvalue is almost never significant for off-book trading, while in the on-book market it is quite often significant.



C. Clustering of trading behaviour 

The existence of significant eigenvalues allows us to use the correlation matrix as a distance measure in an attempt to classify institutions into groups of similar or dissimilar trading patterns. We apply clustering techniques using a metric chosen so that two strongly correlated institutions are 'close' and anti-correlated institutions are 'far away'. A functional form fulfilling this requirement and satisfying the properties of a metric is [2]

$$d_{i,j} = \sqrt{2\,(1-\rho_{i,j})}, \qquad (2)$$

where $\rho_{i,j}$ is the correlation coefficient between strategies $i$ and $j$. We have tried several reasonable modifications to this form, but without obvious differences in the results. Ultimately the choice of this metric is influenced by the fact that it has been successfully used in other studies [2]. We use complete linkage clustering, in which the distance between two clusters is calculated as the maximum distance between their members. We also tried using the minimum distance (called "single linkage clustering"), which produced clusters similar to minimal spanning trees but without obvious benefits^.
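The metric of Eq. (2) and complete linkage can be sketched without any clustering library. The naive loop below is for illustration only (hypothetical function names; roughly cubic in $N$); in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='complete'` would normally be used. The toy correlation matrix has three blocks mimicking the correlated, anti-correlated and weakly correlated groups; its block values are arbitrary assumptions.

```python
import numpy as np

def correlation_distance(C):
    """Eq. (2): d_ij = sqrt(2 (1 - rho_ij)); clipping guards against tiny
    negative values from floating-point rounding on the diagonal."""
    return np.sqrt(np.clip(2.0 * (1.0 - np.asarray(C, dtype=float)), 0.0, None))

def complete_linkage(D):
    """Naive agglomerative clustering with complete linkage.

    Returns the merges as (height, members_a, members_b) in the order they
    happen, plus the dendrogram leaf order (usable to reorder C)."""
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: the largest distance between members
                h = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((h, clusters[a], clusters[b]))
        rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters = rest + [clusters[a] + clusters[b]]
    return merges, clusters[0]

def cut_at(merges, n, height):
    """Cluster labels after cutting the dendrogram at `height`.
    Complete-linkage merge heights never decrease, so we can stop early."""
    labels = np.arange(n)
    for h, left, right in merges:
        if h > height:
            break
        labels[left + right] = labels[left[0]]
    return np.unique(labels, return_inverse=True)[1]

# Toy correlation matrix: 5 correlated, 3 anti-correlated with them,
# 2 weakly correlated with both (block values are arbitrary).
groups = np.array([0] * 5 + [1] * 3 + [2] * 2)
block = np.array([[0.8, -0.6, 0.0], [-0.6, 0.8, 0.0], [0.0, 0.0, 0.8]])
C = block[groups][:, groups]
np.fill_diagonal(C, 1.0)
merges, order = complete_linkage(correlation_distance(C))
```

Cutting this toy dendrogram at height 1.0 separates all three groups, while at 1.6 the weakly correlated group joins one of the others, mirroring the finer and coarser cuts discussed for figure 2.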

The first benefit of creating a clustering is that it lets us rearrange the rows and columns of the correlation matrix according to cluster membership. In the top part of figure 2 we already showed the rearranged correlation matrix for off-book trading in Vodafone for November 2000. The bottom part shows the corresponding dendrogram. In the correlation matrix one notices a large, highly correlated group of institutions as the red block of the matrix. One also notices a smaller number of institutions with strategies that are anti-correlated with the large group. These institutions in turn are correlated among themselves. Finally, in the right part of the matrix there is a group of institutions that is weakly correlated with both of the previous two. These basic observations are also confirmed by the clustering dendrogram, which is plotted so that the institutions in the correlation matrix correspond to the institutions in the dendrogram. Cutting the dendrogram at height 1.7, for example, we recover the two main clusters consisting of the correlated red and the anti-correlated blue institutions. Cutting the dendrogram at a finer level, say just below 1.6, we also recover the weakly correlated cluster of institutions. The structure of the dendrogram below 1.4 is suppressed for clarity of the figure, as those levels of detail are noisy. For other months and other stocks we observe very similar patterns. The top clustering level typically classifies institutions into a larger correlated group and a smaller anti-correlated group.

^ We have also constructed minimal spanning trees from the data, but without an obvious interpretation.

FIG. 4: Largest eigenvalues of the correlation matrix over the 32 months for the stock Vodafone. The top figure is for on-book trading, the bottom for off-book trading. Blue points represent the largest empirical eigenvalues and are to be compared with the blue error bars, which denote the null hypothesis of no correlation. Red points are the second largest eigenvalues and are to be compared with the red error bars. The error bars are centered at the median and correspond to two standard deviations of the distribution of the largest monthly eigenvalues under the null.

FIG. 5: Correlation matrix and clustering dendrogram for on-book trading in VOD in November 2000. The correlated and anti-correlated groups of institutions are easily identifiable; however, for this month, the clustering algorithm does not properly classify the institutions at the top clustering level. We have added lines to help guide the eye to what is perhaps a better clustering than the one the algorithm came up with. It seems that the leftmost group of correlated institutions should have been clustered together with the rest of the correlated institutions.

The clustering for the on-book market is similar, though weaker. Figure 5 shows the correlation matrix and the clustering dendrogram for on-book trading for the same month and stock as the example we showed earlier in figure 2. We again see correlated and anti-correlated groups of institutions, as well as the weakly correlated group. The clustering algorithm in this case, however, does not separate the correlated and anti-correlated groups at the top level of the clustering; instead it places the weakly correlated group in one cluster and the other two groups in the other. At a finer level of clustering (lower height in the dendrogram) the three groups are clustered separately. The clustering algorithm and the distance metric we currently use may not be optimal for assigning institutions to clusters, but there is an indication that the clustering makes sense. In any case, the existence of clusters of institutions based on the correlations in their strategies suggests that it may be possible to develop a taxonomy of trading strategies.



D. Time persistence of correlations 

Time persistence, when it is possible to investigate it, offers a fairly robust and strong test for spuriousness. If a correlation is spurious it is not likely to persist in time. In contrast, if the correlations are persistent, then the clusters of institutions should also persist in time. As noted before, the LSE rescrambles the codes assigned to the institutions at the turn of each month. It is therefore not possible simply to track the correlations between institutions in time. Fortunately, there is a partial solution to this problem. By exploiting other information in the dataset we are able to unscramble the codes over a few consecutive months for some institutions. Unfortunately, the method works only for trading on the on-book market and typically does not work for institutions that do not trade frequently^. Therefore, the results reported in this section concern only the on-book market and are based mostly on the more active institutions. Since the correlations are typically stronger in the off-book market, we believe the results shown here would also hold for the off-book market, and perhaps be even stronger.

Given the problems with tracking institutions in time, we focus only on persistence over two months. To form a dataset we seek all pairs of institution codes that are present in the market for two months in a row. For all such pairs we compare the correlation in the first of the two months, $c_1$, to the correlation in the second month, $c_2$. If the correlation between two institutions was high in the first month, we estimate how likely it is that it will be high in the second month as well by calibrating a simple linear regression

$$c_2 = \alpha + \beta\, c_1 + \epsilon, \qquad (3)$$

assuming $\epsilon$ to be i.i.d. Gaussian. For the stock VOD we identify 7246 linkable consecutive pairs, for AZN 1623, for LLOY 1930 and for AAL 640. All the regressions are well specified: the residuals are roughly normal and i.i.d. The regression results for the on-book market are summarized in table I. All stocks show significant and positive slope coefficients, with $R^2$ between about 2% and 6%. Correlated institutions tend to stay correlated, though the relationship is not strong.

^ In the LSE data we use, each order submitted to the limit order book is assigned a unique identifier. This identifier allows us to track an order in the book and everything that happens to it during its history. If at the turn of the month (the scrambling period) an institution has an order sitting in the book, we can connect the institution codes associated with the order before and after the scrambling. For example, if an order coded AT82F31E13 was submitted to the book on the 31st by institution 2331, and that same order was then canceled on the 1st by institution 4142, we know that the institution that was 2331 was recoded as 4142. This typically allows us to link the codes for most active institutions for many months in a row, and in several cases even for the entire 32-month period. The LSE has indicated that they do not mind us doing this, and has since provided us with the information we need to unscramble all the codes.
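The unscrambling trick in the footnote amounts to a join on order identifiers across the month boundary. The sketch below is hypothetical: the input format (dictionaries mapping order identifiers to the institution code seen in each month), the function name `link_codes` and the rule of keeping only unambiguous one-to-one links are assumptions, not a description of the LSE data layout.

```python
from collections import Counter

def link_codes(codes_before, codes_after):
    """Map old institution codes to new ones via orders that straddle the
    monthly rescrambling.

    codes_before : {order_id: institution code seen in month k}
    codes_after  : {order_id: institution code seen in month k + 1}
    Only unambiguous one-to-one links are returned.
    """
    candidates = {}
    for order_id, old in codes_before.items():
        new = codes_after.get(order_id)
        if new is not None:
            candidates.setdefault(old, set()).add(new)
    # Drop old codes that point to more than one new code ...
    unique = {old: next(iter(news)) for old, news in candidates.items()
              if len(news) == 1}
    # ... and new codes claimed by more than one old code.
    hits = Counter(unique.values())
    return {old: new for old, new in unique.items() if hits[new] == 1}

# The example from the footnote: order AT82F31E13 submitted by 2331 on the
# 31st and canceled by 4142 on the 1st (orders X1 and X2 are made up).
links = link_codes({"AT82F31E13": "2331", "X1": "1111"},
                   {"AT82F31E13": "4142", "X2": "9999"})
```

Chaining such month-to-month maps is what allows the most active institutions to be followed over several consecutive months.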
TABLE I: Regression results of equation (3) for correlations between institutions in two consecutive months. Significant slope coefficients show that if two institutions' strategies were correlated in one month, they are likely to be correlated in the next one as well. The table does not contain the off-book market because we cannot reconstruct institution codes for the off-book market in the same way as we can for the on-book market. The ± values are the standard errors of the coefficient estimates and the values in parentheses are the standard p-values.

On-book market

Stock   Intercept               Slope                R²
AAL     -0.010 ± 0.004 (0.02)   0.25 ± 0.04 (0.00)   0.061
AZN     -0.01 ± 0.003 (0.00)    0.14 ± 0.03 (0.00)   0.019
LLOY     0.003 ± 0.003 (0.28)   0.23 ± 0.02 (0.00)   0.053
VOD      0.008 ± 0.001 (0.00)   0.17 ± 0.01 (0.00)   0.029
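Calibrating Eq. (3) is an ordinary least-squares fit. A minimal sketch with NumPy follows; the function name is hypothetical, the p-values use a normal approximation to the $t$ distribution (reasonable for the thousands of pairs in Table I), and the data are synthetic with a slope of 0.2, so the numbers it produces are not the paper's results.

```python
import numpy as np
from math import erfc, sqrt

def ols_persistence(c1, c2):
    """Least-squares fit of c2 = alpha + beta * c1 + eps (Eq. 3).

    Returns (coefficients, standard errors, two-sided p-values, R^2)."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    X = np.column_stack([np.ones_like(c1), c1])      # intercept and slope
    coef, *_ = np.linalg.lstsq(X, c2, rcond=None)
    resid = c2 - X @ coef
    n, k = X.shape
    s2 = resid @ resid / (n - k)                      # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    # Normal approximation to the t distribution (large n assumed).
    pvals = [erfc(abs(b / s) / sqrt(2)) for b, s in zip(coef, se)]
    r2 = 1.0 - resid @ resid / np.sum((c2 - c2.mean()) ** 2)
    return coef, se, pvals, r2

rng = np.random.default_rng(0)
c1 = rng.normal(0.0, 0.15, size=5000)                # synthetic month-1 correlations
c2 = 0.01 + 0.2 * c1 + rng.normal(0.0, 0.12, size=5000)
coef, se, pvals, r2 = ols_persistence(c1, c2)
```

Even with a clearly significant slope, $R^2$ stays at a few percent when the noise dominates, which is the pattern reported in Table I.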



Another sign of persistence is an institution consistently being placed in the same cluster. If an institution ends up in a given cluster more often than it would at random, we can infer that the cluster is meaningful. For this purpose we must have a way to distinguish the clusters by some property. A visual examination of many correlation matrices and dendrograms makes it clear that the two top-level clusters are typically of quite different sizes. It therefore seems natural to call them the majority and the minority cluster. Even though it was not always the case, the numbers of members in the two top clusters differed substantially more often than not. Acknowledging that this may not be a very robust distinguishing feature, we choose it as a simple means to distinguish the main clusters.



The probability that an institution would randomly be placed in the minority a given number of times is analogous to throwing a biased coin the same number of times, with the bias determined by the relative sizes of the two clusters. If the probability of being in the minority were a constant $p$ throughout the $K$ months, the number of times $x$ an institution would randomly end up in the minority would be described by a binomial distribution

$$B(x; p, K) = \binom{K}{x} p^{x} (1-p)^{K-x}. \qquad (4)$$

In our case, however, the probability of being in the minority is not constant, but varies monthly with the number of active institutions and the size of the minority. If the size of the minority is half the total number of institutions, the probability of ending up in the minority by chance is 1/2. If the size of the minority is very small compared to the total number of institutions, the probability of ending up in that cluster by chance is correspondingly very small. Denoting by $n_k$ the number of active institutions in month $k$ and by $\mu_k$ the number of institutions in the minority cluster, the probability for an institution to be in the minority in month $k$ by chance is $p_k = \mu_k/n_k$. The probability of ending up in the minority in exactly the observed months by chance is then

$$P(x; p_k, K) = \prod_{k \in \mathrm{min}} p_k \prod_{k \in \mathrm{maj}} (1 - p_k), \qquad (5)$$

where $k$ indexes the months in which the institution was in the minority or in the majority, respectively; summing this product over all patterns with the same number $x$ of minority months gives the distribution of $x$.
A further complication is that not all institutions are active in the same months, so the probability density differs from institution to institution: depending on which months the institution was active, the above product picks out the corresponding probabilities $p_k$. Because of this complication we calculate the probability density for each institution through a simulation. We simply pick out the months in which the institution was active, for each month draw a trial randomly according to $p_k$, and count the number of times the trial was successful, i.e., the institution ended up in the minority. Repeating this many times, we obtain for each institution the full distribution of the number of times it can end up in the minority at random.
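The simulation described above can be sketched as follows, together with an exact computation of the same distribution (a Poisson binomial, obtained by convolving one biased coin per month) that can serve as a check. Function names are hypothetical and the monthly probabilities $p_k = \mu_k/n_k$ in the example are made up.

```python
import numpy as np

def simulate_minority_counts(p_k, n_sim=100_000, seed=0):
    """Monte Carlo null for one institution: in each month it was active,
    it lands in the minority with probability p_k = mu_k / n_k."""
    rng = np.random.default_rng(seed)
    p_k = np.asarray(p_k, dtype=float)
    return (rng.random((n_sim, p_k.size)) < p_k).sum(axis=1)

def prob_nonrandom(times_in_minority, p_k, **kwargs):
    """One minus the chance of being in the minority at least this often."""
    counts = simulate_minority_counts(p_k, **kwargs)
    return 1.0 - np.mean(counts >= times_in_minority)

def exact_counts_pmf(p_k):
    """Exact distribution of the same count (Poisson binomial), for checking."""
    pmf = np.array([1.0])
    for p in p_k:
        pmf = np.convolve(pmf, [1.0 - p, p])      # add one biased coin
    return pmf

p_k = np.linspace(0.1, 0.3, 20)    # hypothetical mu_k / n_k for 20 active months
simulated = prob_nonrandom(9, p_k)
exact = 1.0 - exact_counts_pmf(p_k)[9:].sum()
```

With a constant $p_k = p$ the exact distribution reduces to the binomial of Eq. (4).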

TABLE II: Result of the test on minority members for on-book trading in Vodafone. Institutions marked with an asterisk are those whose behavior is not consistent with the hypothesis of random behavior at the 5% level.

Inst. code   Times in minority   Out of possible   Prob. of non-random behavior
3265*        16                  32                0.99
2548          7                  32                0.14
2575          6                  32                0.07
2533          3                  19                0.11
2040*        14                  31                0.97
1720          9                  20                0.93
1876          5                  14                0.73
2688          8                  30                0.34
1776*        11                  22                0.99
2086          9                  23                0.86
0867*        10                  22                0.95
2995*        12                  20                1.00
2569          7                  21                0.64



Similarly, because we are using the institution codes over intervals of more than one month, we can perform this test only for institutions in the on-book market. We limit the test to the stock Vodafone and apply it only to institutions that we can track for more than 12 of the 32 months. This results in 13 institutions on which we base the test. For other stocks we are not able to track institutions for long periods, and the power of the test would be weak.

The results for the 13 institutions are given in table II. The leftmost column is the institution code, followed by the number of times that institution has been in the minority. The column 'Out of possible' counts the number of months an institution has been present in the market, i.e. the maximum number of times it could have been in the minority. Finally, the rightmost column gives one minus the probability that the institution could have randomly been in the minority at least that many times. We display one minus this probability so that large values indicate behavior that is not consistent with the random null hypothesis. Most institutions have quite high probabilities of non-random behavior, and we mark the institutions that pass the test at the 5% level. Out of 13 institutions, 5 have been in the minority cluster more often than they would have been just by chance at the 5% level. This is substantially higher than the expected number of 0.65 (that is, 13 × 0.05) out of 13 tested at this significance level.
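The comparison with 0.65 can be made slightly more quantitative. If, as an additional assumption the paper does not make, the 13 tests were independent, the number of institutions passing at the 5% level by chance would be binomial with $n = 13$ and $p = 0.05$; the sketch below computes that expected number and the probability of 5 or more passes, which comes out at roughly $3 \times 10^{-4}$.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n_tested, level, n_passed = 13, 0.05, 5
expected_by_chance = n_tested * level               # 0.65 institutions
p_five_or_more = binom_tail(n_passed, n_tested, level)
```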

III. CONCLUSIONS 

We have shown that even very crude definitions of institutions' strategies, defined on one-hour intervals, produce significant and persistent correlations. In the off-book market these correlations are organized so that there is typically a small group of institutions anti-correlated with a larger second group. The strategies within each of the two groups are correlated. Clustering analysis also clearly reveals this structure. The volume transacted by the smaller group, typically containing no more than 15 institutions in Vodafone, accounts for about half of the total trading volume. The larger group, typically of around 80 institutions in Vodafone, transacts the remaining half of the total trade volume. This is an indication that the smaller group can be identified as the group of dealers in the off-book market. They provide liquidity for the larger group of institutions, and their strategies are anti-correlated: the dealers buy when the other institutions are selling and vice versa. The single large monthly eigenvalue in the off-book market is related to this basic dynamic.

In contrast to the off-book market, the on-book market does not display only one large eigenvalue. There are typically one or two significant eigenvalues each month. The eigenvalues are relatively smaller and the correlations not as strong. Still, we are able to identify the basic clustering structure seen in the off-book market, namely a small and a large group of anti-correlated strategies. However, the volume traded by the small cluster does not seem to equal the volume of the large cluster, and the dynamics seem to be more complicated. The largest eigenvalue may still be related to transactions between the two clusters of institutions, but the occasionally significant second largest eigenvalue suggests that more complicated dynamics are taking place.

These results suggest that trading on the LSE is a relatively structured process in terms of trading strategies. At a given time, there are groups of institutions all trading in the same direction, with other groups trading in the opposite direction and providing liquidity.

It is important to stress that what we have conveniently labeled a "strategy" is more typically a collection of strategies all being executed by the same member of the exchange. From this point of view it is particularly remarkable that we observe heterogeneity, as it depends on the tendency of certain types of strategies to execute through particular members of the exchange (or, in some cases, on individual strategies bearing the expense of purchasing their own membership). One expects that if we were able to observe actual account-level information, we would see much cleaner and stronger similarities and differences between strategies.